In computer architecture, the memory hierarchy separates computer storage into a hierarchy based on response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and controlling technologies. Memory hierarchy affects performance in computer architectural design, algorithm predictions, and lower-level programming constructs involving locality of reference.
Designing for high performance requires considering the restrictions of the memory hierarchy, i.e. the size and capabilities of each component. Each of the various components can be viewed as part of a hierarchy of memories in which each member is typically smaller and faster than the next highest member of the hierarchy. To limit waiting by higher levels, a lower level responds by filling a buffer and then signaling that the transfer can begin.
There are four major storage levels.
Cache, memory, and external storage hierarchy of a 2020s computer system (AMD Zen 4)

| Level | Sub-level | Size | Throughput | Latency | Notes |
|---|---|---|---|---|---|
| | | | | | All CPU-related conversion assumes a 4.0 GHz clock. Same for below. Full utilization of throughput is impossible on real workloads. Size is provided for each core. |
| CPU cache | | | | | Hardware prefetching is required for maximum throughput. Size and throughput are per-core. Code cache has the same size but is not manipulable as data. |
| | | | | | Size and throughput are per-core. |
| | | | | | Size is shared among 8 cores. Throughput is per-core. |
| Main memory (primary storage) | | | | | Size is shared among all cores. Latency depends on the memory clock and memory timings. In this case, a result from a pair of 32 GB DDR5 DIMMs set to 6000 MT/s via the factory EXPO profile is used. Systems with multiple CPU sockets have an additional NUMA delay when a CPU tries to access memory under the control of another NUMA node. |
| Mass storage (secondary) | Solid-state drive | 2 TB | 2000 MB/s | 0.2 ms | Figures for an M.2 NVMe SSD from 2017, the Samsung 960 Pro. |
| | Hard disk drive | 18 TB | 500 MB/s | 4.16 ms | Per-drive figures for Exos 2X18 (ST18000NM0092), an enterprise-grade 3.5 inch SATA HDD. |
| Nearline storage (tertiary storage) | Spun-down HDDs (MAID) | Petabytes | | 25 s | Per-drive figures for Exos 2X18 (ST18000NM0092), from user manual entry for "start/stop times". In a typical MAID setup, hundreds of spun-down HDDs may be used for petabytes of storage. |
| | Tape library | Exabytes | 160 MB/s | Minutes | |
| Offline storage | | Exabytes | Depends on medium | Depends on human operation | |
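The unit conversions behind the table can be sketched in a few lines of C. This is only an illustration under stated assumptions: the 4.0 GHz clock is the one the table notes assume, the drive figures are the SSD and HDD rows above, and the function names are hypothetical.

```c
#include <assert.h>

/* Clock rate assumed by the table's CPU-related rows. */
#define CLOCK_GHZ 4.0

/* Convert a latency in CPU cycles to nanoseconds at CLOCK_GHZ. */
double cycles_to_ns(double cycles) {
    assert(cycles >= 0.0);
    return cycles / CLOCK_GHZ;
}

/* Convert a latency in milliseconds to CPU cycles at CLOCK_GHZ. */
double ms_to_cycles(double ms) {
    assert(ms >= 0.0);
    return ms * CLOCK_GHZ * 1e6;
}

/* Seconds needed to stream capacity_tb terabytes at mb_per_s MB/s
   (decimal units: 1 TB = 1,000,000 MB). */
double full_read_seconds(double capacity_tb, double mb_per_s) {
    assert(mb_per_s > 0.0);
    return capacity_tb * 1e6 / mb_per_s;
}
```

With these helpers, `ms_to_cycles(0.2)` puts the SSD's 0.2 ms latency at roughly 800,000 cycles, and `full_read_seconds(18.0, 500.0)` gives 36,000 seconds (10 hours) to read the whole 18 TB hard disk at its peak throughput.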
Some CPUs include additional levels of cache between L3 and memory. For example, the Haswell microarchitecture includes an L4 cache of 128 MB on mobile units.
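The lower levels of the hierarchy, from mass storage downwards, are also known as tiered storage. The formal distinction between online, nearline, and offline storage is as follows: online storage is immediately available for I/O; nearline storage is not immediately available, but can be brought online quickly without human intervention; offline storage is not immediately available and requires human intervention to bring online.
For example, always-on spinning disks are online, while spinning disks that spin down, such as a massive array of idle disks (MAID), are nearline. Removable media such as tape cartridges that can be automatically loaded, as in a tape library, are nearline, while cartridges that must be manually loaded are offline.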
Modern programming languages mainly assume two levels of memory: main (working) memory and mass storage. The exceptions are relatively low-level assembly language and higher-level languages such as C, where "prefetch" instructions can be used to preload the cache. Taking optimal advantage of the memory hierarchy requires the cooperation of programmers, hardware, and compilers, as well as underlying support from the operating system.
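As a rough sketch of what a prefetch hint looks like from C, the following uses `__builtin_prefetch`, a compiler builtin provided by GCC and Clang (not standard C). The function name and the prefetch distance of 16 elements are illustrative assumptions; a useful distance depends on the hardware and the workload.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting upcoming elements into the cache. */
long sum_with_prefetch(const long *data, size_t n) {
    assert(data != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 = prefetch for reading;
               third argument 3 = high temporal locality. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

The hint never changes the result, only (possibly) the timing. For a simple sequential scan like this one, the hardware prefetcher usually does the job already, so explicit hints tend to matter more for irregular access patterns such as pointer chasing.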
Many programmers assume one level of memory. This works fine until the application hits a performance wall. At that point, the programmer needs to change the code's memory access patterns so that it works well with cache resources. A classic illustration of the effect of locality and caching is changing the order in which a three-dimensional array is iterated. Computer Systems: A Programmer's Perspective is a classic textbook that deals with this aspect of systems programming.
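The loop-order effect can be sketched as follows. Both functions compute the same sum over a three-dimensional array; they differ only in which index the innermost loop walks. The array dimensions and function names are assumptions chosen for illustration, and the actual speed difference depends on the array size relative to the caches.

```c
#include <assert.h>
#include <stddef.h>

enum { NX = 32, NY = 32, NZ = 32 };

/* Cache-friendly: the innermost loop walks the last index, so
   consecutive accesses touch adjacent ints (C arrays are row-major). */
long sum_row_major(int a[NX][NY][NZ]) {
    assert(a != NULL);
    long total = 0;
    for (size_t i = 0; i < NX; i++)
        for (size_t j = 0; j < NY; j++)
            for (size_t k = 0; k < NZ; k++)
                total += a[i][j][k];
    return total;
}

/* Cache-unfriendly: the innermost loop walks the first index, so
   consecutive accesses are NY * NZ ints apart in memory. */
long sum_strided(int a[NX][NY][NZ]) {
    assert(a != NULL);
    long total = 0;
    for (size_t k = 0; k < NZ; k++)
        for (size_t j = 0; j < NY; j++)
            for (size_t i = 0; i < NX; i++)
                total += a[i][j][k];
    return total;
}
```

The first version uses every element of each cache line it loads before moving on, while the second touches one element per line and may have to reload lines that were evicted in the meantime. Timing both on a much larger array is a simple way to observe the difference.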
Memory tiering is implemented on Linux as an extension to NUMA, where each memory provider has a CPU-less NUMA node with an appropriate "abstract distance" reflecting its performance. The existing scheme for migrating memory between NUMA nodes according to "hotness" was adapted to tiering by Huang Ying (Al Maruf's TPP scheme is not in Linux mainline). Linux tiering also uses a weighted-interleave allocation policy.
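The idea behind a weighted-interleave policy can be illustrated with a small user-space model. This is not kernel code and makes no kernel calls: the function name is hypothetical, and the assumption is simply that each node receives a number of consecutive pages equal to its weight before allocation moves to the next node, cyclically.

```c
#include <assert.h>
#include <stddef.h>

/* Return the node that receives the page_index-th allocation when
   node i is given weights[i] consecutive pages per cycle. */
size_t weighted_interleave_node(const unsigned *weights, size_t n_nodes,
                                size_t page_index) {
    assert(weights != NULL && n_nodes > 0);

    /* Length of one full cycle across all nodes. */
    size_t cycle = 0;
    for (size_t i = 0; i < n_nodes; i++)
        cycle += weights[i];
    assert(cycle > 0);

    /* Find which node's slot contains this position in the cycle. */
    size_t pos = page_index % cycle;
    for (size_t i = 0; i < n_nodes; i++) {
        if (pos < weights[i])
            return i;
        pos -= weights[i];
    }
    return n_nodes - 1; /* not reached when cycle > 0 */
}
```

With weights of 3 for a fast node and 1 for a slower, CPU-less node, three of every four pages land on the fast node, which spreads bandwidth demand across tiers roughly in proportion to what each can deliver.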